Data Engineering | ifkarsyah

Projects

Data Engineering Garden — Knowledge Base

A public digital garden of data engineering notes, concepts, and guides — built with Quartz v4 and published as a static site.

Data Engineering, Spark, SQL

Lakehouse Platform

Featured

A self-service data lakehouse built on Databricks and Delta Lake, unifying batch and streaming workloads with a single storage layer.

Data Engineering, Spark, Databricks, Delta Lake

Blog Posts

Nov 17, 2024

Databricks Series, Part 6: ML Serving and Workflows

Batch and real-time model inference, Databricks Model Serving endpoints, and orchestrating the full ML pipeline with Databricks Workflows.

Data Engineering, Databricks, Delta Lake

Nov 10, 2024

Databricks Series, Part 5: Machine Learning with MLflow

Tracking experiments, logging models and artifacts, comparing runs, and managing the model lifecycle with MLflow on Databricks.

Data Engineering, Databricks, Delta Lake

Nov 3, 2024

Databricks Series, Part 4: Feature Engineering at Scale

Databricks Feature Store, FeatureEngineeringClient, FeatureLookup, training sets, and eliminating training-serving skew.

Data Engineering, Databricks, Delta Lake

Oct 27, 2024

Databricks Series, Part 3: Data Ingestion with Auto Loader

cloudFiles format, schema inference, schema evolution, and building robust incremental ingestion pipelines on Databricks.

Data Engineering, Databricks, Delta Lake

Oct 20, 2024

Databricks Series, Part 2: Lakehouse Architecture

Unity Catalog for governance and discovery, the medallion Bronze/Silver/Gold pattern, and Delta tables as the storage foundation.

Data Engineering, Databricks, Delta Lake

Oct 13, 2024

Databricks Series, Part 1: Getting Started

Navigating the Databricks workspace, launching clusters, writing notebooks, and submitting your first PySpark job.

Data Engineering, Databricks, Delta Lake

Oct 6, 2024

Databricks Series, Part 0: Overview

The lakehouse platform concept, what Databricks adds on top of Spark and Delta Lake, and how it compares to alternatives.

Data Engineering, Databricks, Delta Lake

Feb 4, 2024

Spark Series, Part 4: Performance Tuning

Making Spark jobs fast — partitioning, shuffles, skew, caching, and the most common bottlenecks in production.

Data Engineering, Spark

Jan 28, 2024

Spark Series, Part 3: Structured Streaming

Real-time data processing with Spark Structured Streaming — micro-batches, triggers, watermarks, and output modes.

Data Engineering, Spark

Jan 21, 2024

Spark Series, Part 2: DataFrames and Spark SQL

The practical Spark API — working with structured data using DataFrames, schemas, and SQL queries.

Data Engineering, Spark

Jan 14, 2024

Spark Series, Part 1: RDDs and the Execution Model

Understanding Resilient Distributed Datasets — the foundation of Spark's execution model, transformations, actions, and lazy evaluation.

Data Engineering, Spark

Jan 7, 2024

Spark Series, Part 0: Overview

A high-level introduction to Apache Spark — what it is, why it exists, and where it fits in the modern data stack.

Data Engineering, Spark

Aug 5, 2023

Designing a Data Platform That Doesn't Rot

Lessons from building internal data platforms: what makes them last, what kills them, and the principles I try to apply.

Data Engineering
